AITopics | non-markovian decision process

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust Offline Reinforcement Learning for Non-Markovian Decision Processes

Huang, Ruiquan, Liang, Yingbin, Yang, Jing

arXiv.org Machine Learning, Jan-4-2025

Distributionally robust offline reinforcement learning (RL) aims to find a policy that performs best under the worst environment within an uncertainty set, using an offline dataset collected from a nominal model. While recent advances in robust RL focus on Markov decision processes (MDPs), robust non-Markovian RL has been limited to the planning problem, where the transitions in the uncertainty set are known. In this paper, we study the learning problem of robust offline non-Markovian RL. Specifically, when the nominal model admits a low-rank structure, we propose a new algorithm featuring a novel dataset distillation and a lower confidence bound (LCB) design for robust values under different types of uncertainty sets. We also derive new dual forms for these robust values in non-Markovian RL, making our algorithm more amenable to practical implementation. By further introducing a novel type-I concentrability coefficient tailored for offline low-rank non-Markovian decision processes, we prove that our algorithm can find an $\epsilon$-optimal robust policy using $O(1/\epsilon^2)$ offline samples. Moreover, we extend our algorithm to the case where the nominal model has no specific structure. With a new type-II concentrability coefficient, the extended algorithm also enjoys polynomial sample efficiency under all of these types of uncertainty sets.
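To make the idea of a "robust value" and its dual form concrete, here is a minimal sketch for a single finite distribution, not the paper's algorithm. It assumes a KL-divergence uncertainty set of radius `radius` around a nominal distribution (the abstract does not say which set types are used) and relies on the standard dual identity $\inf_{P:\,\mathrm{KL}(P\|P_0)\le\rho}\mathbb{E}_P[V]=\sup_{\lambda>0}\{-\lambda\log\mathbb{E}_{P_0}[e^{-V/\lambda}]-\lambda\rho\}$. The function name `kl_robust_value` and the grid search over $\lambda$ are illustrative choices.

```python
import math


def kl_robust_value(values, nominal_probs, radius, grid=None):
    """Worst-case expectation of `values` over {P : KL(P || P0) <= radius}.

    Illustrative sketch: maximizes the scalar dual objective over a fixed
    log-spaced grid of multipliers, so the result is a lower bound on the
    exact robust value that tightens as the grid is refined.
    """
    # Only outcomes with positive nominal probability can carry mass under P.
    support = [(v, p) for v, p in zip(values, nominal_probs) if p > 0]
    v_min = min(v for v, _ in support)
    if grid is None:
        grid = [10.0 ** (k / 20.0) for k in range(-80, 81)]  # 1e-4 .. 1e4

    # As lambda -> 0+ the dual objective tends to the smallest supported value,
    # which is therefore always a valid starting point.
    best = v_min
    for lam in grid:
        # Shift by v_min so every exponent is <= 0 (no overflow).
        s = sum(p * math.exp(-(v - v_min) / lam) for v, p in support)
        dual = v_min - lam * math.log(s) - lam * radius
        best = max(best, dual)
    return best


if __name__ == "__main__":
    values = [0.0, 1.0, 2.0]
    nominal = [0.2, 0.5, 0.3]
    for rho in (0.0, 0.05, 0.5):
        print(rho, kl_robust_value(values, nominal, rho))
```

The dual replaces an optimization over distributions with a one-dimensional search over $\lambda$, which is the kind of simplification that makes robust values practical to compute. A larger radius can only lower the result, and once the radius is large enough the value collapses to the worst supported outcome.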

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv:2411.07514

Country:

North America > United States > Pennsylvania > Centre County > University Park (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

[D] How to deal with non-Markovian decision processes with large/infinite horizon using MCTS? • r/MachineLearning

@machinelearnbot, Jan-2-2018, 14:30:52 GMT

A quick Google search will tell you that MCTS is applicable to large- or infinite-horizon RL tasks, but there seems to be no empirical confirmation that it works as well there as it does on Go. Assume that no rollouts are used, just as in AlphaZero. Go's state space is larger than that of other games, but its horizon is short (not much more than 100 timesteps). The state space of many real-world problems grows exponentially w.r.t. the timestep in the following sense.
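For readers unfamiliar with the setup the post describes, the following is a minimal sketch of rollout-free MCTS with an AlphaZero-style PUCT selection rule, where each tree node stands for a full action history rather than a Markov state. Everything here is an assumption for illustration: the environment is taken to be single-agent and deterministic (so a history is just the action sequence), `evaluate` is a hypothetical stand-in for a learned network returning action priors and a value estimate, and the exploration term uses `max(1, parent visits)`, which is one common variant.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float
    visits: int = 0
    value_sum: float = 0.0
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self):
        # Mean backed-up value; 0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0


def select_action(node, c_puct=1.5):
    """PUCT: pick the child maximizing Q + c * prior * sqrt(N_parent) / (1 + N_child)."""
    total = sum(child.visits for child in node.children.values())
    sqrt_total = math.sqrt(max(1, total))

    def score(item):
        _, child = item
        return child.q() + c_puct * child.prior * sqrt_total / (1 + child.visits)

    return max(node.children.items(), key=score)[0]


def search(evaluate, actions, n_simulations=50, c_puct=1.5, max_depth=20):
    """Run simulations from an empty history and return root visit counts."""
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, history, path = root, [], [root]
        # Selection: follow PUCT until reaching a node with no children.
        while node.children:
            action = select_action(node, c_puct)
            node = node.children[action]
            history.append(action)
            path.append(node)
        # Evaluation replaces the rollout: the value comes from `evaluate`.
        priors, value = evaluate(tuple(history))
        # Expansion, capped at max_depth so long horizons stay bounded.
        if len(path) <= max_depth:
            for a in actions:
                node.children[a] = Node(prior=priors[a])
        # Backup along the visited path (single agent, so no sign flip).
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return {a: child.visits for a, child in root.children.items()}


if __name__ == "__main__":
    def toy_evaluate(history):
        # Hypothetical evaluator: histories starting with "a" are worth 1.
        value = 1.0 if history and history[0] == "a" else 0.0
        return {"a": 0.5, "b": 0.5}, value

    print(search(toy_evaluate, ["a", "b"]))
```

Because nodes are identified by their path from the root, nothing is merged across different histories. That keeps the search valid without a Markov assumption, but it also means the tree can grow exponentially with depth, which is why the sketch caps expansion with `max_depth`.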

artificial intelligence, non-markovian decision process, social media, (7 more...)

Industry:

Leisure & Entertainment > Games (0.41)
Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)